AITopics | protein sequence design

Collaborating Authors

protein sequence design

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Swarms of Large Language Model Agents for Protein Sequence Design with Experimental Validation

Wang, Fiona Y., Lee, Di Sheng, Kaplan, David L., Buehler, Markus J.

arXiv.org Artificial IntelligenceDec-1-2025

Designing proteins de novo with tailored structural, physicochemical, and functional properties remains a grand challenge in biotechnology, medicine, and materials science, due to the vastness of sequence space and the complex coupling between sequence, structure, and function. Current state-of-the-art generative methods, such as protein language models (PLMs) and diffusion-based architectures, often require extensive fine-tuning, task-specific data, or model reconfiguration to support objective-directed design, thereby limiting their flexibility and scalability. To overcome these limitations, we present a decentralized, agent-based framework inspired by swarm intelligence for de novo protein design. In this approach, multiple large language model (LLM) agents operate in parallel, each assigned to a specific residue position. These agents iteratively propose context-aware mutations by integrating design objectives, local neighborhood interactions, and memory and feedback from previous iterations. This position-wise, decentralized coordination enables emergent design of diverse, well-defined sequences without reliance on motif scaffolds or multiple sequence alignments, validated with experiments on proteins with alpha helix and coil structures. Through analyses of residue conservation, structure-based metrics, and sequence convergence and embeddings, we demonstrate that the framework exhibits emergent behaviors and effective navigation of the protein fitness landscape. Our method achieves efficient, objective-directed designs within a few GPU-hours and operates entirely without fine-tuning or specialized training, offering a generalizable and adaptable solution for protein design. Beyond proteins, the approach lays the groundwork for collective LLM-driven design across biomolecular systems and other scientific discovery tasks.

large language model, machine learning, natural language, (13 more...)

arXiv.org Artificial Intelligence

2511.22311

Country: North America > United States > Massachusetts > Middlesex County (0.28)

Genre: Research Report (0.83)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.66)

Add feedback

Improving Protein Sequence Design through Designability Preference Optimization

Xue, Fanglei, Kubaney, Andrew, Guo, Zhichun, Min, Joseph K., Liu, Ge, Yang, Yi, Baker, David

arXiv.org Artificial IntelligenceJun-3-2025

Protein sequence design methods have demonstrated strong performance in sequence generation for de novo protein design. However, as the training objective was sequence recovery, it does not guarantee designability--the likelihood that a designed sequence folds into the desired structure. To bridge this gap, we redefine the training objective by steering sequence generation toward high designability. To do this, we integrate Direct Preference Optimization (DPO), using AlphaFold pLDDT scores as the preference signal, which significantly improves the in silico design success rate. To further refine sequence generation at a finer, residue-level granularity, we introduce Residue-level Designability Preference Optimization (ResiDPO), which applies residue-level structural rewards and decouples optimization across residues. This enables direct improvement in designability while preserving regions that already perform well. Using a curated dataset with residue-level annotations, we fine-tune LigandMPNN with ResiDPO to obtain EnhancedMPNN, which achieves a nearly 3-fold increase in in silico design success rate (from 6.56% to 17.57%) on a challenging enzyme design benchmark.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2506.00297

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)

Add feedback

ProtFlow: Fast Protein Sequence Design via Flow Matching on Compressed Protein Language Model Embeddings

Kong, Zitai, Zhu, Yiheng, Xu, Yinlong, Zhou, Hanjing, Yin, Mingzhe, Wu, Jialu, Xu, Hongxia, Hsieh, Chang-Yu, Hou, Tingjun, Wu, Jian

arXiv.org Artificial IntelligenceApr-16-2025

The design of protein sequences with desired functionalities is a fundamental task in protein engineering. Deep generative methods, such as autoregressive models and diffusion models, have greatly accelerated the discovery of novel protein sequences. However, these methods mainly focus on local or shallow residual semantics and suffer from low inference efficiency, large modeling space and high training cost. To address these challenges, we introduce ProtFlow, a fast flow matching-based protein sequence design framework that operates on embeddings derived from semantically meaningful latent space of protein language models. By compressing and smoothing the latent space, ProtFlow enhances performance while training on limited computational resources. Leveraging reflow techniques, ProtFlow enables high-quality single-step sequence generation. Additionally, we develop a joint design pipeline for the design scene of multichain proteins. We evaluate ProtFlow across diverse protein design tasks, including general peptides and long-chain proteins, antimicrobial peptides, and antibodies. Experimental results demonstrate that ProtFlow outperforms task-specific methods in these applications, underscoring its potential and broad applicability in computational protein sequence design and analysis.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2504.10983

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Geometric deep learning for protein sequence design

AIHubSep-10-2024, 10:20:34 GMT

The geometric transformer samples the sequence space of the beta-lactamase TEM-1 enzyme (in grey) complexed a natural substrate (in cyan) to produce new well folded and active enzymes. Designing proteins that can perform specific functions involves understanding and manipulating their sequences and structures. This task is crucial for developing targeted treatments for diseases and creating enzymes for industrial applications. One of the grand challenges in protein engineering is designing proteins de novo, meaning from scratch, to tailor their properties for specific tasks. This has profound implications for biology, medicine, and materials science.

carbonara, protein sequence design, sequence, (11 more...)

AIHub

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)

Add feedback

Importance Weighted Expectation-Maximization for Protein Sequence Design

Song, Zhenqiao, Li, Lei

arXiv.org Artificial IntelligenceApr-30-2023

Designing protein sequences with desired biological function is crucial in biology and chemistry. Recent machine learning methods use a surrogate sequence-function model to replace the expensive wet-lab validation. How can we efficiently generate diverse and novel protein sequences with high fitness? In this paper, we propose IsEM-Pro, an approach to generate protein sequences towards a given fitness criterion. At its core, IsEM-Pro is a latent generative model, augmented by combinatorial structure features from a separately learned Markov random fields (MRFs). We develop an Monte Carlo Expectation-Maximization method (MCEM) to learn the model. During inference, sampling from its latent space enhances diversity while its MRFs features guide the exploration in high fitness regions. Experiments on eight protein sequence design tasks show that our IsEM-Pro outperforms the previous best methods by at least 55% on average fitness score and generates more diverse and novel protein sequences.

artificial intelligence, evolutionary algorithm, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2305.00386

Country: North America > United States (1.00)

Genre: Research Report (0.40)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)
(2 more...)

Add feedback

Protein Sequence Design with Batch Bayesian Optimisation

Zong, Chuanjiao

arXiv.org Artificial IntelligenceMar-18-2023

Protein sequence design is a challenging problem in protein engineering, which aims to discover novel proteins with useful biological functions. Directed evolution is a widely-used approach for protein sequence design, which mimics the evolution cycle in a laboratory environment and conducts an iterative protocol. However, the burden of laboratory experiments can be reduced by using machine learning approaches to build a surrogate model of the protein landscape and conducting in-silico population selection through model-based fitness prediction. In this paper, we propose a new method based on Batch Bayesian Optimization (Batch BO), a well-established optimization method, for protein sequence design. By incorporating Batch BO into the directed evolution process, our method is able to make more informed decisions about which sequences to select for artificial evolution, leading to improved performance and faster convergence. We evaluate our method on a suite of in-silico protein sequence design tasks and demonstrate substantial improvement over baseline algorithms.

artificial intelligence, machine learning, sequence, (14 more...)

arXiv.org Artificial Intelligence

2303.10429

Country: Europe > France (0.05)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.89)

Add feedback

PDBench: Evaluating Computational Methods for Protein Sequence Design

#artificialintelligenceSep-22-2021, 02:09:05 GMT

Proteins perform critical processes in all living systems: converting solar energy into chemical energy, replicating DNA, as the basis of highly performant materials, sensing and much more. While an incredible range of functionality has been sampled in nature, it accounts for a tiny fraction of the possible protein universe. If we could tap into this pool of unexplored protein structures, we could search for novel proteins with useful properties that we could apply to tackle the environmental and medical challenges facing humanity. This is the purpose of protein design. Sequence design is an important aspect of protein design, and many successful methods to do this have been developed.

computational method, pdbench, protein sequence design, (2 more...)

#artificialintelligence

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.73)
Energy > Renewable > Solar (0.51)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)

Add feedback